Latent-Variable Modeling of String Transductions with Finite-State Methods
Authors
Abstract
String-to-string transduction is a central problem in computational linguistics and natural language processing. It occurs in tasks as diverse as name transliteration, spelling correction, pronunciation modeling and inflectional morphology. We present a conditional loglinear model for string-to-string transduction, which employs overlapping features over latent alignment sequences, and which learns latent classes and latent string pair regions from incomplete training data. We evaluate our approach on morphological tasks and demonstrate that latent variables can dramatically improve results, even when trained on small data sets. On the task of generating morphological forms, we outperform a baseline method, reducing the error rate by up to 48%. On a lemmatization task, we reduce the error rates reported by Wicentowski (2002) by 38–92%.
Similar resources
Expressiveness of streaming string transducers
Streaming string transducers [1] define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose right-hand-sides...
Finitary Compositions of Two-way Finite-State Transductions
The hierarchy of arbitrary compositions of two-way nondeterministic finite-state transductions collapses when restricted to finitary transductions, i.e., transductions that produce a finite set of outputs for each input. The hierarchy collapses to the class of nondeterministic MSO definable transductions, which is inside the second level of that hierarchy. It is decidable whether a composition ...
Internship report - Streaming String Transducers
In formal language theory, two very different models sometimes turn out to describe the same class of languages. This usually shows that there is a fundamental concept described by those models. A well-known example is the class of regular languages, which can be characterized by logic (monadic second order (MSO) logic), algebra (syntactic monoids), and many computational models (automata). In ...
30th International Conference on Foundations of Software
Streaming string transducers [1] define (partial) functions from input strings to output strings. A streaming string transducer makes a single pass through the input string and uses a finite set of variables that range over strings from the output alphabet. At every step, the transducer processes an input symbol, and updates all the variables in parallel using assignments whose righ...
Linear Transduction Grammars and Zipper Finite-State Transducers
We examine how the recently explored class of linear transductions relates to finite-state models. Linear transductions have been neglected historically, but have gained recent interest in statistical machine translation modeling, due to empirical studies demonstrating that their attractive balance of generative capacity and complexity characteristics leads to improved accuracy and speed in learnin...
Publication date: 2008